AITopics | data manipulation

data manipulation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning Data Manipulation for Augmentation and Weighting

Zhiting Hu, Bowen Tan, Russ R. Salakhutdinov, Tom M. Mitchell, Eric P. Xing

Neural Information Processing Systems | Feb-19-2026, 15:23:29 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, data augmentation, manipulation, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
(2 more...)

6c1e55ec7c43dc51a37472ddcbd756fb-Paper.pdf

Neural Information Processing Systems | Feb-8-2026, 19:14:58 GMT

algorithm, data provider, learner, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.05)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Learning Data Manipulation for Augmentation and Weighting

Zhiting Hu, Bowen Tan, Russ R. Salakhutdinov, Tom M. Mitchell, Eric P. Xing

Neural Information Processing Systems | Oct-9-2025, 14:19:18 GMT

Manipulating data, such as weighting data examples or augmenting with new instances, has been increasingly used to improve model training.
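The two manipulations named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of the paper: the function names `weighted_log_loss` and `augment_with_jitter`, the Gaussian jitter, and all numbers are hypothetical choices made for this example.

```python
import math
import random


def weighted_log_loss(probs, labels, weights):
    """Weighted average of per-example binary cross-entropy losses."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        example_loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += w * example_loss
    return total / sum(weights)


def augment_with_jitter(features, labels, rng, scale=0.05):
    """Append one noisy copy of every example, keeping its label (assumed scheme)."""
    noisy = [[x + rng.gauss(0.0, scale) for x in row] for row in features]
    return features + noisy, labels + labels


# Made-up predictions and labels, used only for illustration.
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 1]

# Uniform weights reproduce the ordinary mean loss.
uniform = weighted_log_loss(probs, labels, [1.0, 1.0, 1.0])
# Up-weighting the hardest example makes it dominate the objective.
reweighted = weighted_log_loss(probs, labels, [1.0, 1.0, 3.0])

rng = random.Random(0)
aug_x, aug_y = augment_with_jitter([[0.0, 1.0], [2.0, 3.0]], [0, 1], rng)
```

Here the weights and the jitter scale are fixed by hand; the point is only to show where each manipulation enters the training objective.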

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
(2 more...)

Query-Efficient Adversarial Attack Against Vertical Federated Graph Learning

Chen, Jinyin, Mu, Wenbo, Zhang, Luxin, Huang, Guohan, Zheng, Haibin, Cheng, Yao

arXiv.org Artificial Intelligence | Nov-4-2024

Graph neural networks (GNNs) have attracted wide attention for their ability to learn representations of graph-structured data. However, distributed data silos limit GNN performance. Vertical federated learning (VFL), an emerging technique for processing distributed data, enables GNNs to handle distributed graph-structured data. Despite the rapid development of vertical federated graph learning (VFGL), its robustness against adversarial attacks has not yet been explored. Although numerous adversarial attacks against centralized GNNs have been proposed, their attack performance is limited in the VFGL scenario. To the best of our knowledge, this is the first work to explore adversarial attacks against VFGL. A query-efficient hybrid adversarial attack framework, denoted NA2 (short for Neuron-based Adversarial Attack), is proposed to significantly improve centralized adversarial attacks against VFGL. Specifically, a malicious client manipulates its local training data to improve its contribution in a stealthy fashion. A shadow model is then established based on the manipulated data to simulate the behavior of the server model in VFGL. As a result, the shadow model can improve the attack success rate of various centralized attacks with only a few queries. Extensive experiments on five real-world benchmarks demonstrate that NA2 improves the performance of centralized adversarial attacks against VFGL, achieving state-of-the-art performance even under a potential adaptive defense where the defender knows the attack method. Additionally, we provide interpretable experiments on the effectiveness of NA2 via sensitive neuron identification and t-SNE visualization.

artificial intelligence, machine learning, shadow model, (16 more...)

arXiv.org Artificial Intelligence

2411.02809

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > Singapore (0.04)
Asia > Japan (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Truthful Dataset Valuation by Pointwise Mutual Information

Zheng, Shuran, Kwon, Yongchan, Qi, Xuan, Zou, James

arXiv.org Artificial Intelligence | May-28-2024

A common way to evaluate a dataset in ML involves training a model on this dataset and assessing the model's performance on a test set. However, this approach has two issues: (1) it may incentivize undesirable data manipulation in data marketplaces, as the self-interested data providers seek to modify the dataset to maximize their evaluation scores; (2) it may select datasets that overfit to potentially small test sets. We propose a new data valuation method that provably guarantees the following: data providers always maximize their expected score by truthfully reporting their observed data. Any manipulation of the data, including but not limited to data duplication, adding random data, data removal, or re-weighting data from different groups, cannot increase their expected score. Our method, following the paradigm of proper scoring rules, measures the pointwise mutual information (PMI) of the test dataset and the evaluated dataset. However, computing the PMI of two datasets is challenging. We introduce a novel PMI measuring method that greatly improves tractability within Bayesian machine learning contexts. This is accomplished through a new characterization of PMI that relies solely on the posterior probabilities of the model parameter at an arbitrarily selected value. Finally, we support our theoretical results with simulations and further test the effectiveness of our data valuation method in identifying the top datasets among multiple data providers. Interestingly, our method outperforms the standard approach of selecting datasets based on the trained model's test performance, suggesting that our truthful valuation score can also be more robust to overfitting.
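To make the PMI characterization above concrete, here is a minimal sketch on a toy Beta-Bernoulli model. The model choice, the function names, and the example counts are assumptions made for this illustration and are not taken from the paper. Assuming the two datasets are conditionally independent given the parameter θ, the PMI computed from marginal likelihoods should match the posterior-only form log[p(θ|D1) p(θ|D2) / (p(θ) p(θ|D1,D2))] at any θ in (0, 1).

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal(ones, zeros, a=1.0, b=1.0):
    """Log marginal likelihood of a Bernoulli sequence under a Beta(a, b) prior."""
    return log_beta(a + ones, b + zeros) - log_beta(a, b)


def pmi_from_marginals(d1, d2, a=1.0, b=1.0):
    """PMI as log p(D1, D2) - log p(D1) - log p(D2); datasets are (ones, zeros)."""
    (k1, z1), (k2, z2) = d1, d2
    return (log_marginal(k1 + k2, z1 + z2, a, b)
            - log_marginal(k1, z1, a, b)
            - log_marginal(k2, z2, a, b))


def log_beta_pdf(theta, a, b):
    """Log density of Beta(a, b) at theta, for theta strictly between 0 and 1."""
    return ((a - 1) * math.log(theta)
            + (b - 1) * math.log(1 - theta)
            - log_beta(a, b))


def pmi_from_posteriors(d1, d2, theta, a=1.0, b=1.0):
    """Same PMI, using only prior and posterior densities at one chosen theta."""
    (k1, z1), (k2, z2) = d1, d2
    return (log_beta_pdf(theta, a + k1, b + z1)
            + log_beta_pdf(theta, a + k2, b + z2)
            - log_beta_pdf(theta, a, b)
            - log_beta_pdf(theta, a + k1 + k2, b + z1 + z2))


# Made-up datasets given as (number of ones, number of zeros).
evaluated = (8, 2)
test_set = (7, 3)

via_marginals = pmi_from_marginals(evaluated, test_set)
via_posteriors = pmi_from_posteriors(evaluated, test_set, theta=0.3)
```

In this conjugate setting both routes are cheap. The posterior-only form is of interest mainly when marginal likelihoods are hard to compute but posterior densities can be evaluated or approximated.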

data provider, dataset, pmi score, (13 more...)

arXiv.org Artificial Intelligence

2405.18253

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
Europe > France (0.04)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Enhancement attacks in biomedical machine learning

Rosenblatt, Matthew, Dadashkarimi, Javid, Scheinost, Dustin

arXiv.org Artificial Intelligence | Aug-16-2023

The prevalence of machine learning in biomedical research is rapidly growing, yet the trustworthiness of such research is often overlooked. While some previous works have investigated the ability of adversarial attacks to degrade model performance in medical imaging, the ability to falsely improve performance via recently developed "enhancement attacks" may be a greater threat to biomedical machine learning. In the spirit of developing attacks to better understand trustworthiness, we developed two techniques to drastically enhance prediction performance of classifiers with minimal changes to features: 1) general enhancement of prediction performance, and 2) enhancement of a particular method over another. Our enhancement framework falsely improved classifiers' accuracy from 50% to almost 100% while maintaining high feature similarities between original and enhanced data (Pearson's r > 0.99). Similarly, the method-specific enhancement framework was effective in falsely improving the performance of one method over another. For example, a simple neural network outperformed logistic regression by 17% on our enhanced dataset, although no performance differences were present in the original dataset. Crucially, the original and enhanced data were still similar (r = 0.99). Our results demonstrate the feasibility of minor data manipulations to achieve any desired prediction performance, which presents an interesting ethical challenge for the future of biomedical machine learning. These findings emphasize the need for more robust data provenance tracking and other precautionary measures to ensure the integrity of biomedical machine learning research.

artificial intelligence, enhancement attack, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2301.01885

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Roadmap to Getting into Data Science

#artificialintelligence | Feb-18-2023, 10:35:32 GMT

Getting started with data science can be a confusing journey, especially if you do not come from a STEM field. In this article, I explore and define the essential aspects of data science you need to get started correctly. The article mainly covers the technical skills a data scientist requires: to become one, you need to be familiar with programming, statistics, and machine learning. It outlines the steps you can take to become a data scientist and the important libraries you need to know.

data science, data scientist, library, (12 more...)

#artificialintelligence

Industry: Education > Curriculum > Subject-Specific Education (0.39)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.38)

Top Five Essential Skills to Master in Artificial Intelligence

#artificialintelligence | Jan-1-2023, 06:25:06 GMT

Artificial intelligence (AI) is rapidly transforming industries around the world, from health care to finance to retail. As AI becomes more prevalent, the demand for professionals with AI skills is only expected to grow. In this blog post, we highlight five essential skills that every AI professional should master in order to succeed in this rapidly evolving field. From machine learning and deep learning to data manipulation and problem-solving, these skills will give you the foundation you need to build and work with AI systems. So let's dive in and explore these essential AI skills in more detail.

foundation, learning, machine learning, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Python for Data Science: A Look at the Top Libraries

#artificialintelligence | Dec-30-2022, 13:10:49 GMT

Python is a popular language for data science due to its powerful libraries and tools for data manipulation, visualization, machine learning, and statistical analysis. In this listicle, we introduce some of the top Python libraries for data science and provide a quick way to get started with them. NumPy is a library for working with large, multi-dimensional arrays and matrices of numerical data. It provides functions for performing mathematical operations on arrays, such as linear algebra, statistical analysis, and random number generation. Pandas provides functions for reading in data from various sources, cleaning and wrangling data, and performing aggregations and transformations. Matplotlib is a library for creating static, animated, and interactive visualizations in Python.
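As a rough illustration of the operations described above, the sketch below uses NumPy for the array operations and pandas for the cleaning and aggregation step. The data and variable names are made up for this example, and the snippet assumes both libraries are installed.

```python
import numpy as np
import pandas as pd

# NumPy: a small 2-D array, a column-wise statistic, and a matrix-vector product.
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
column_means = matrix.mean(axis=0)
product = matrix @ np.array([1.0, 1.0])

# pandas: drop rows with a missing value, then aggregate per group.
df = pd.DataFrame({"group": ["a", "a", "b"], "value": [1.0, None, 4.0]})
cleaned = df.dropna(subset=["value"])
group_means = cleaned.groupby("group")["value"].mean()
```

Plotting is left out so the snippet runs without a display; `group_means.plot(kind="bar")` would hand the result to Matplotlib if it is available.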

library, mathematical operation, python, (13 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.43)
